
20 research outputs found

Interactive visualisation and exploration of biological data

Author: Gilbert D
Helden JV
Schroeder M
Publication date: 01/01/2000

International audience; no abstract

HAL AMU

Brunel University Research Archive

Representing and analysing molecular and cellular function in the computer

Author: Eldridge M
Gilbert D
Helden JV
Mancuso R
Naim A
Wernisch L
Wodak SJ
Publication venue: 'American Society for Biochemistry & Molecular Biology (ASBMB)'
Publication date: 01/01/2000

Determining the biological function of a myriad of genes, and understanding how they interact to yield a living cell, is the major challenge of the post genome-sequencing era. The complexity of biological systems is such that this cannot be envisaged without the help of powerful computer systems capable of representing and analysing the intricate networks of physical and functional interactions between the different cellular components. In this review we try to provide the reader with an appreciation of where we stand in this regard. We discuss some of the inherent problems in describing the different facets of biological function, give an overview of how information on function is currently represented in the major biological databases, and describe different systems for organising and categorising the functions of gene products. In the second part, we present a new general data model, currently under development, which describes information on molecular function and cellular processes in a rigorous manner. The model is capable of representing a large variety of biochemical processes, including metabolic pathways, regulation of gene expression and signal transduction. It also incorporates taxonomies for categorising molecular entities, interactions and processes, and it offers means of viewing the information at different levels of resolution and of dealing with incomplete knowledge. The data model has been implemented in the database on protein function and cellular processes 'aMAZE' (http://www.ebi.ac.uk/research/pfbp/), which presently covers metabolic pathways and their regulation. Several tools for querying, displaying, and performing analyses on such pathways are briefly described in order to illustrate the practical applications enabled by the model.

HAL AMU

DI-fusion

Brunel University Research Archive

Fast algorithms for computing sequence distances by exhaustive substring composition

Author: A Apostolico
A Kolmogorov
A Lempel
Alberto Apostolico
B Blaisdell
B Hao
H Otu
I Ulitsky
J Na
J Qi
JV Helden
L Brillouin
LL Gatlin
M Höhl
M Li
Olgert Denas
P Ferragina
R Edgar
R von Mises
S Vinga
TJ Wu
TM Cover
Publication venue: BioMed Central
Publication date: 01/10/2008

The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information-theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes as a paradigm the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.
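As a rough illustration of what a subword-composition measure looks like, the sketch below compares two sequences through the relative frequencies of their overlapping subwords of one fixed length. It is a deliberately simplified stand-in, not the method of the paper: it uses a single word length `k` rather than all subwords, applies no statistical background adjustment, and scores with a plain cosine distance. The function names and the default `k=3` are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Relative frequencies of all overlapping length-k subwords of seq."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {word: n / total for word, n in counts.items()}


def composition_distance(seq_a, seq_b, k=3):
    """Cosine distance in [0, 1] between two k-mer frequency profiles.

    Simplification: fixed k and no background correction (hypothetical
    choices for illustration only).
    """
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    norm_a = sqrt(sum(v * v for v in fa.values()))
    norm_b = sqrt(sum(v * v for v in fb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # at least one sequence is shorter than k
    dot = sum(v * fb.get(word, 0.0) for word, v in fa.items())
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))
```

Identical sequences give a distance of (numerically) zero, and sequences sharing no subword of length `k` give 1.0. Counting is linear in sequence length for a fixed `k`, which is the property that makes this family of measures attractive at genomic scale.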

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Frequency distribution of TATA Box and extension sequences on human promoters

Author: A O'Shea-Greenfield
C Martins
CE Lawrence
E Eisenberg
J Wong
JA Warrington
JL Kim
JV Helden
L Xu
LL Hsiao
M Molina
MG Tadesse
P Bucher
PC FitzGerald
R Development Core Team
S Dorus
S Takeda
ST Smale
Wanlei Zhou
Wei Shi
X Xie
Y Suzuki
YC Kim
ZS Juo
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The TATA box is one of the most important transcription factor binding sites, but its exact sequences are still not very clear. RESULTS: In this study, we conduct a dedicated analysis of the frequency distribution of the TATA box and its extension sequences on human promoters. Sixteen TATA elements derived from the TATA box motif, TATAWAWN, are classified into three distribution patterns: peak, bottom-peak, and bottom. Fourteen TATA extension sequences are predicted to be new TATA box elements due to their high motif factors, which indicate their statistical significance. Statistical analysis of the promoters of mice, zebrafish and Drosophila melanogaster verifies seven of these elements. It is also observed that, in human, the distribution of TATA elements on the promoters of housekeeping genes is very similar to their distribution on the promoters of tissue-specific genes. CONCLUSION: The dedicated statistical analysis of the TATA box and its extension sequences yields new TATA elements. The statistical significance of these elements has been verified on random data sets by calculating their p-values.
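The count of sixteen elements follows from the IUPAC codes in TATAWAWN: each W stands for A or T and N for any base, giving 2 × 2 × 4 = 16 concrete sequences. The sketch below enumerates them and tallies the start positions of one element across a set of promoter sequences. It is only an illustration under assumptions: the helper names and the toy promoter strings are hypothetical, and the abstract's "motif factor" statistic is not reproduced here.

```python
from itertools import product

# IUPAC nucleotide codes needed for the TATAWAWN motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}


def expand_motif(motif):
    """List every concrete DNA sequence matched by an IUPAC motif."""
    return ["".join(bases) for bases in product(*(IUPAC[ch] for ch in motif))]


def position_counts(promoters, element):
    """Count, per start position, how often element occurs in the promoters."""
    counts = {}
    for seq in promoters:
        start = seq.find(element)
        while start != -1:
            counts[start] = counts.get(start, 0) + 1
            start = seq.find(element, start + 1)  # allow overlapping hits
    return counts


elements = expand_motif("TATAWAWN")
print(len(elements))  # 16

# Toy sequences, invented for the example (not real promoters).
toy_promoters = ["GGTATAAAAGG", "TATAAAAG"]
print(position_counts(toy_promoters, "TATAAAAG"))
```

Plotting such per-position counts for each of the sixteen elements is one way to see distribution shapes of the kind the abstract calls peak, bottom-peak, and bottom.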

Deakin Research Online

Crossref

Springer - Publisher Connector

PubMed Central

Clusters of Conserved Beta Cell Marker Genes for Assessment of Beta Cell Phenotype

The aim of this study was to establish a gene expression blueprint of pancreatic beta cells conserved from rodents to humans and to evaluate its applicability to assess shifts in the beta cell differentiated state. Genome-wide mRNA expression profiles of isolated beta cells were compared to those of a large panel of other tissue and cell types, and transcripts with beta cell-abundant and -selective expression were identified. Iteration of this analysis in mouse, rat and human tissues generated a panel of conserved beta cell biomarkers. This panel was then used to compare isolated versus laser capture microdissected beta cells, monitor adaptations of the beta cell phenotype to fasting, and retrieve possible conserved transcriptional regulators.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Copenhagen University Research Information System

DI-fusion

Identification of gene targets against dormant phase Mycobacterium tuberculosis infections

Author: A Jindani
A Marchler-Bauer
AE Choudhry
AM Stock
AM Talaat
B Hutter
B Hutter
C Boon
C Dye
CD Sohaskey
CD Sohaskey
CM Sassetti
CM Sassetti
D Avarbock
D SALKIN
D Schnappinger
DA Mitchison
DA Mitchison
DA Mitchison
DC Crick
Dennis J Murphy
DF Warner
DG Muttucumaru
DJ Payne
DK Saini
DK Saini
DK Saini
DM Roberts
DR Sherman
E Wooff
EA Weinstein
EJ Munoz-Elias
EJ Munoz-Elias
ER Rhoades
G Cappelli
G Kaplan
G Lamichhane
G Wisedchaisri
GR Stewart
H He
H Makinoshima
H Ohno
H Rachman
H Rachman
HD Park
HI Boshoff
HI Boshoff
I Weber
J Daniel
J DeMaio
J DeMaio
J Grosset
J Li
J Rengarajan
J Starck
James R Brown
JC Betts
JD MacMicking
JD McKinney
JF Barrett
JL Dahl
JN Stewart
JS Parkinson
JV HURFORD
K Andries
K Kvint
KH Darwin
L Shi
LG Wayne
LG Wayne
LG Wayne
LG Wayne
M Kanehisa
M Sun
MA Fabian
MG Erickson
MI Voskuil
MI Voskuil
MJ Macielag
MJ Pabst
MW Schelle
P Freestone
P Freestone
PC Karakousis
PD van Helden
PJ Brennan
PR Wheeler
R O'Toole
R Pinto
R Qamra
R Tatusov
RL Tatusov
S Hasan
S Mani Tripathi
S Rodrigue
SH Cho
SJ Williams
SL Kendall
SS Dawes
ST Cole
T Hampshire
T Lillebaek
T Nystrom
T Nystrom
TC Zahrt
TC Zahrt
TI Zarembinski
TP Primm
V Jain
V Malhotra
V Sharma
V Usha
VK Sambandamurthy
WF Loomis
WF Maragos
Y Hu
Y Zhang
Y Zhang
Y Zhang
YM Hu
Z Xie
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects approximately 2 billion people worldwide and is the leading cause of mortality due to infectious disease. Current TB therapy involves a regimen of four antibiotics taken over a six-month period. Patient compliance, cost of drugs and increasing incidence of drug-resistant M. tuberculosis strains have added urgency to the development of novel TB therapies. Eradication of TB is affected by the ability of the bacterium to survive up to decades in a dormant state, primarily in hypoxic granulomas in the lung, and to cause recurrent infections. Methods: The availability of M. tuberculosis genome-wide DNA microarrays has led to the publication of several gene expression studies under simulated dormancy conditions. However, no single model best replicates the conditions of human pathogenicity. In order to identify novel TB drug targets, we performed a meta-analysis of multiple published datasets from gene expression DNA microarray experiments that modeled infection leading to and including the dormant state, along with data from genome-wide insertional mutagenesis that examined gene essentiality. Results: Based on the analysis of these data sets following normalization, several genome-wide trends were identified and used to guide the selection of targets for therapeutic development. The trends included the significant up-regulation of genes controlled by devR, down-regulation of protein and ATP synthesis, and the adaptation of two-carbon metabolism to the hypoxic and nutrient-limited environment of the granuloma. Promising targets for drug discovery were several regulatory elements (devR/devS, relA, mprAB), enzymes involved in redox balance and respiration, sulfur transport and fixation, and pantothenate, isoprene, and NAD biosynthesis. The advantages and liabilities of each target are discussed in the context of enzymology, bacterial pathways, target tractability, and drug development. Conclusion: Based on our bioinformatics analysis and additional discussion of in-depth biological rationale, several novel anti-TB targets have been proposed as potential opportunities to improve present therapeutic treatments for this disease.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs.

Author: A Medina-Rivera
A Tsurumi
A Valouev
B Langmead
Carl Herrmann
CM Bergman
Denis Thieffry
DS Johnson
E Mercier
E Portales-Casamar
E Wingender
Elodie Darbo
G Robertson
IV Kulakovskiy
J Goecks
J van Helden
J van Helden
J van Helden
Jacques van Helden
JD McPherson
JR Sanford
JS Kanodia
JV Turatsinze
L Kuttippurathu
LJ Zhu
M Defrance
M Salmon-Divon
M Thomas-Chollier
M Thomas-Chollier
Matthieu Defrance
MJ Blow
MJ Fullwood
MM Harrison
Morgane Thomas-Chollier
MS Cline
N Rusk
O Sand
P Agius
P Flicek
P Machanick
PA Fujita
RK Bradley
S Gama-Castro
S Pepke
SJ van Heeringen
T Barrett
TI Lee
TL Bailey
V Boeva
X Chen
Y Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

This protocol explains how to use the online integrated pipeline 'peak-motifs' (http://rsat.ulb.ac.be/rsat/) to predict motifs and binding sites in full-size peak sets obtained by chromatin immunoprecipitation-sequencing (ChIP-seq) or related technologies. The workflow combines four time- and memory-efficient motif discovery algorithms to extract significant motifs from the sequences. Discovered motifs are compared with databases of known motifs to identify potentially bound transcription factors. Sequences are scanned to predict transcription factor binding sites and analyze their enrichment and positional distribution relative to peak centers. Peaks and binding sites are exported as BED tracks that can be uploaded into the University of California Santa Cruz (UCSC) genome browser for visualization in the genomic context. This protocol is illustrated with the analysis of a set of 6,000 peaks (8 Mb in total) bound by the Drosophila transcription factor Krüppel. The complete workflow is achieved in about 25 min of computational time on the Regulatory Sequence Analysis Tools (RSAT) Web server. This protocol can be followed in about 1 h.